Abstract

Validation of image segmentation methods is of critical importance.
Probabilistic image segmentation is increasingly popular as it captures
uncertainty in the results. Image segmentation methods that support
multi-region (as opposed to binary) delineation are more favourable as they
capture interactions between the different objects in the image. The Dice
similarity coefficient (DSC) has been a popular metric for evaluating the
accuracy of automated or semi-automated segmentation methods by comparing
their results to the ground truth. In this work, we develop an extension of
the DSC to multi-region probabilistic segmentations (with unordered labels).
We use bipartite graph matching to establish label correspondences and
propose two functions that extend the DSC, one based on absolute probability
differences and one based on the Aitchison distance. These provide a robust
and accurate measure of multi-region probabilistic segmentation accuracy.


Ghassan Hamarneh
Simon Fraser University
hamarneh@sfu.ca


1. Introduction

Rigorous validation of automated and semi-automated image segmentation
methods is of undeniable importance. Aside from a few exceptions [1] [2] [3],
the most common validation approach in image analysis has been to compare
the automated results to “ground truth” data, i.e. segmentations obtained by
expert users or through physical or mathematical phantoms. Thus, evaluating
the accuracy of image segmentation results is typically carried out using
the well-known Dice similarity coefficient (DSC) [4] or other metrics such
as the Hausdorff distance or the Jaccard index [5]. Even the evaluation of
image registration results is increasingly being done by evaluating the
accuracy of atlas-based segmentation [6], in which the DSC is commonly used.

Numerous sources of uncertainty exist in shape boundaries [7], including
region heterogeneity (in medical imaging), image acquisition artifacts (e.g.
blurring), and segmentation by multiple raters. In the past decade, there
has been a notable focus on encoding uncertainty in the segmentation results
and not ignoring these uncertainties in subsequent analyses and
decision-making [8]. In order to capture uncertainty information about the
location of multiple structures in the same image, there have been several
works on creating segmentation representations for multi-region
probabilistic segmentations [9], e.g. using hyperspherical labels [10], the
LogOdds [11] of signed distance maps (SDMs) [12], (barycentric) label-space
M1 31 [17 1, and isometric log-ratio maps MM-. Several segmentation
algorithms were also designed to use these representations to output
probabilistic segmentations, both binary MM and multi-region EH-. Further,
fuzziness and uncertainty have been incorporated in other image processing
and pattern recognition methods, e.g. fuzzy distance transforms [20] and
moments [21]. Speaking to the importance of handling probabilistic data, the
visualization community has identified uncertainty visualization as a key
problem in the field [22] [23] [24] [25]. In image registration and shape
matching, uncertainty calculation, visualization and utilization have also
been increasingly popular [26] [27] [28] [29].

When validating a multi-region automated segmentation, be it probabilistic
or crisp (non-probabilistic), a major setback is that there may be no
guarantee that the segmentation will have the same number of regions as the
ground truth, much less correctly corresponding labels. For example, in
[30], Shi and Malik employ a recursive sub-optimal approach to segment
multiple regions, which entails deciding if the current partition should be
further sub-divided and then repartitioning if necessary. In a somewhat
reverse approach, Felzenszwalb and Huttenlocher’s algorithm assigns a
different label to each vertex, then similar pixels are merged using a
greedy decision approach [31]. These methods and many more [32] [33] do not
guarantee a particular label ordering in the resulting segmentation.
Therefore, in order to properly validate an automated multi-region
segmentation, label correspondences must first be determined and over- or
under-segmentations handled properly.

From the previous discussion, it is evident that uncertainty-encoding
segmentation that accommodates multiple regions (with unordered labels) is
an important area. However, there is a lack of published works on methods
for evaluating these multi-region probabilistic segmentation results. The
focus of this paper is to develop an extension of the DSC to multi-region
probabilistic segmentation, along with a method for establishing label
correspondences. Specifically, we use bipartite graph matching to establish
label correspondences and propose two functions that extend the DSC, one
based on absolute probability differences and one based on the Aitchison
distance [34]. As demonstrated by our results, we provide a robust and
accurate measure for multi-region probabilistic segmentation accuracy.


2. Method 


In Section 2.1 we review the classical use of the Dice similarity
coefficient (DSC) for comparing binary segmentations and discuss the
challenges in extending this method to multi-region segmentation. In Section
2.2 we introduce an alternate method for comparing segmentations using the
DSC and then show this method extends easily to multi-region segmentation.
In Section 2.3 we introduce two continuous extensions to the DSC that both
allow the comparison of multi-region probabilistic segmentations and reduce
to the discrete DSC when probabilities are crisp (0 or 1). Finally, in
Section 2.4 we propose a method for establishing label correspondences using
bipartite graph matching.


2.1. Classical Dice similarity coefficient 

The DSC measures the similarity between two sets, $X$ and $Y$ [4]:

$$D(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|}, \tag{1}$$

where $|X|$ denotes the cardinality of the set $X$. $D(X, Y) \in [0, 1]$,
with $D(X, Y) = 0$ if and only if the sets are disjoint and $D(X, Y) = 1$ if
and only if the sets are identical.
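As an illustration of (1), the classical DSC can be sketched for finite sets in a few lines of Python. The function name `dice` and the example pixel sets are assumptions made here for illustration only. The sketch raises an error for two empty sets, where (1) is 0/0.

```python
def dice(x: set, y: set) -> float:
    """Classical Dice similarity coefficient of two finite sets, as in (1)."""
    if not x and not y:
        # (1) evaluates to 0/0 when both sets are empty.
        raise ValueError("DSC is undefined for two empty sets")
    return 2 * len(x & y) / (len(x) + len(y))


# Hypothetical foreground pixel sets (pixel indices), for illustration only.
automated = {1, 2, 3}
ground_truth = {1, 3}
print(dice(automated, ground_truth))  # 2*2 / (3+2) = 0.8
```

Disjoint sets give 0 and identical non-empty sets give 1, matching the two boundary cases stated for (1).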

The DSC has been adapted to image segmentation and 
is a popular method for comparing binary segmentations of 
the same image to each other. Often, the comparison is done 
between the ground truth segmentation and the results of 
automated or semi-automated segmentation methods. 

To measure the DSC between two segmentations, a set has to be constructed
for each. To start, one region in each segmentation is designated the
foreground (as opposed to the background). If $\Omega$ is the set of all
pixels in the image, then the sets compared with the DSC are
$S_f^A, S_f^{GT} \subseteq \Omega$, where $S_f^A$ is the set of pixels
assigned to the foreground by the automated method and $S_f^{GT}$ is the set
of pixels assigned to the foreground in the ground truth.
$D(S_f^A, S_f^{GT})$ provides a measure of how accurate an automated
segmentation result is, with values closer to 1 indicating greater accuracy.
As a simple example with 4 pixels, $\{x_1, \dots, x_4\}$, if


Sf = {xi,X 2 ,xt,} and Sf T = {xi ,* 3 }, then the DSC between 
the automated and ground truth segmentations is found us¬ 
ing 0 : D(SfSf 7 ) = = 5 . 

The above definition of $S_f^A$ and $S_f^{GT}$ is dependent on which region
is assigned to be the foreground. The foreground is often chosen to be the
region of greatest interest, but this choice is not clear for all images,
and may be dependent on the task requiring the segmentation. Thus, when the
choice of the foreground region is not clear, the DSC suffers from ambiguity
as its value differs depending on this choice. This is not usually
problematic for binary segmentation, but when an image is segmented into
multiple regions, which region to assign to the background becomes less
clear. We would therefore like to address this issue when extending the DSC
to compare multi-region segmentations.

2.2. Similarity coefficient for multi-region crisp segmentations

Here we propose a method for using the classical DSC to compare a
multi-region automated segmentation with the ground truth. At the same time,
we remove the need to specify foreground and background regions. For now, we
assume both ground truth and automated segmentations have the same number of
regions. Specifically, we will assume the regions are labeled from
$\mathcal{L} = \{1, \dots, L\}$, where $L$ is the number of regions, and
that each region in the automated segmentation is labeled with the same
number as the corresponding region in the ground truth. This assumption will
be addressed in Section 2.4.

Again, we must construct a set for each segmentation. Here, we propose the
sets $S_\ell^A, S_\ell^{GT} \subseteq (\Omega \times \mathcal{L})$, that is,
sets of ordered pairs consisting of a pixel and an integer from 1 to $L$.
The set $S_\ell^A$ will have, for each pixel, an element containing the
label of the region that that pixel is assigned to in the automated
segmentation; $S_\ell^{GT}$ will be defined similarly for the ground truth.
Thus we have $|S_\ell^A| = |S_\ell^{GT}| = |\Omega|$. By comparing these
sets using the DSC we get a value from 0 to 1 indicating what fraction of
pixels share a label in both segmentations. As a simple example with 4
pixels, $\{x_1, \dots, x_4\}$, and 2 regions, labeled from
$\mathcal{L} = \{1, 2\}$, if
$S_\ell^A = \{(x_1, 1), (x_2, 1), (x_3, 1), (x_4, 2)\}$ and
$S_\ell^{GT} = \{(x_1, 1), (x_2, 2), (x_3, 1), (x_4, 2)\}$, then the DSC
between the automated and ground truth segmentations is found using (1):
$D(S_\ell^A, S_\ell^{GT}) = \frac{2 \cdot 3}{4 + 4} = \frac{3}{4}$.
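The 4-pixel example above can be checked with a short Python sketch. The helper names `dice` and `label_pairs` are assumptions introduced here, and pixels are represented by their indices 1 to 4.

```python
def dice(x: set, y: set) -> float:
    """Classical DSC of two finite, non-empty sets, as in (1)."""
    return 2 * len(x & y) / (len(x) + len(y))


def label_pairs(labels: list) -> set:
    """Set of (pixel index, region label) pairs, one pair per pixel."""
    return {(pixel, label) for pixel, label in enumerate(labels, start=1)}


# Labels of pixels x1..x4 in the automated and ground truth segmentations.
automated = label_pairs([1, 1, 1, 2])
ground_truth = label_pairs([1, 2, 1, 2])

score = dice(automated, ground_truth)
print(score)  # 2*3 / (4+4) = 0.75
```

Because both sets always contain exactly one pair per pixel, the value equals the fraction of pixels that carry the same label in both segmentations, and no foreground region has to be chosen.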

For binary segmentation ($L = 2$) this definition of $S_\ell^A$ and
$S_\ell^{GT}$ makes the DSC take the value 1 only when the segmentations are
identical, and 0 when none of the pixels have the correct (ground truth)
label. Furthermore, the DSC computed with either this method or the method
introduced in Section 2.1 increases as the number of pixels assigned to the
same region (foreground or background) in both segmentations increases
(Figure 1). Thus, $S_\ell^A$ and $S_\ell^{GT}$ give a comparison metric
similar to $S_f^A$ and $S_f^{GT}$, yet do not suffer from the aforementioned
ambiguity and extend naturally to multi-region segmentations.

Figure 1: A comparison of the DSC when using $S_f^A$ and $S_f^{GT}$ versus
using our proposed $S_\ell^A$ and $S_\ell^{GT}$. This figure assumes binary
segmentations. The axes represent the fraction of pixels in the foreground
and the background in both the automated and ground truth segmentations.
White corresponds to 1 and black to 0. Note that at the point (0, 1),
corresponding to both segmentations being all background and thus identical,
the top plot is undefined, and varies rapidly near that point. Our approach
in the bottom plot, however, assigns the value 1 to the point (0, 1),
correctly indicating that both segmentations are the same there.


2.3. Similarity coefficient for multi-region probabilistic segmentations

When comparing discrete, non-probabilistic segmentations, similarity is
justifiably measured in a discrete way: whether or not a pixel is assigned
to the same region in both segmentations. Thus constructing sets and
applying the classical DSC (1), as in Sections 2.1 and 2.2, accurately
captures the similarity between segmentations. However, when considering
continuous, probabilistic segmentations, such discrete comparisons are no
longer applicable.

For example, if the ground truth segmentation assigns pixel $x$ probability
0.9 of being in region $r$, then an automated segmentation that assigns
pixel $x$ probability 0.7 of being in region $r$ should be considered more
similar to the ground truth than if it had assigned probability 0.6, but
less similar than if it had assigned probability 0.8.

To accurately capture the similarity between multi-region probabilistic
segmentations, we need to extend the DSC to a continuous function. We will
again assume that the automated and ground truth segmentations have the same
number of regions, labeled from $\mathcal{L} = \{1, \dots, L\}$, with
corresponding regions labeled the same (violating this assumption is
addressed in Section 2.4). We define the simplex of probability vectors of
length $L$:

$$\mathbb{S}^L = \left\{ [p_1, p_2, \dots, p_L] \in \mathbb{R}^L :
p_i \ge 0,\ i = 1, 2, \dots, L;\ \sum_{i=1}^{L} p_i = 1 \right\}. \tag{2}$$


We let $p^A$ and $p^{GT}$ represent the multi-region probabilistic automated
and ground truth segmentations, respectively. $p^A$ and $p^{GT}$ assign to
each pixel $x$ regional (or label) probabilities
$p^A(x), p^{GT}(x) \in \mathbb{S}^L$. Our first step will be to define a
pixel to pixel similarity function
$f : \mathbb{S}^L \times \mathbb{S}^L \to [0, 1]$, mapping two vectors of
$L$ regional probabilities to the interval $[0, 1]$ such that larger values
correspond to more similar regional probabilities. Furthermore, $f = 1$
should imply the regional probabilities are identical and $f = 0$ should
imply that every region is assigned probability 0 by at least one of the
segmentations.

Once we have defined $f$, we can define a continuous extension to the DSC,
$D^{cts}$, that compares two multi-region probabilistic segmentations:

$$D^{cts}(p^A, p^{GT}) = \frac{1}{|\Omega|} \sum_{x \in \Omega}
f\big(p^A(x), p^{GT}(x)\big). \tag{3}$$

Since $f \in [0, 1]$, dividing the summation by $|\Omega|$ ensures
$D^{cts} \in [0, 1]$. $D^{cts} = 1$ if and only if $f = 1$ for all pixels
and $D^{cts} = 0$ if and only if $f = 0$ for all pixels. Note that, given
our requirements for $f$, $D^{cts}(p^A, p^{GT})$ reduces to
$D(S_\ell^A, S_\ell^{GT})$ from (1) in the case that all probabilities are
either 0 or 1.
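A minimal Python sketch of (3) follows. It assumes each segmentation is given as a list with one probability vector per pixel. The names `d_cts` and `crisp_agreement` are hypothetical, and `crisp_agreement` is only a stand-in for $f$ that is valid for crisp (one-hot) vectors, used to illustrate the reduction to the discrete DSC.

```python
from typing import Callable, Sequence

Probabilities = Sequence[float]  # regional probabilities of one pixel


def d_cts(p_a: Sequence[Probabilities],
          p_gt: Sequence[Probabilities],
          f: Callable[[Probabilities, Probabilities], float]) -> float:
    """Average of the pixel-wise similarity f over all pixels, as in (3)."""
    if len(p_a) != len(p_gt) or len(p_a) == 0:
        raise ValueError("segmentations must cover the same non-empty pixel set")
    return sum(f(a, g) for a, g in zip(p_a, p_gt)) / len(p_a)


def crisp_agreement(a: Probabilities, g: Probabilities) -> float:
    """Stand-in for f on one-hot vectors: 1 if the labels agree, else 0."""
    return 1.0 if tuple(a) == tuple(g) else 0.0


# Crisp labels [1, 1, 1, 2] and [1, 2, 1, 2] written as one-hot vectors.
automated = [(1, 0), (1, 0), (1, 0), (0, 1)]
ground_truth = [(1, 0), (0, 1), (1, 0), (0, 1)]

score = d_cts(automated, ground_truth, crisp_agreement)
print(score)  # 3 of 4 pixels agree: 0.75
```

Passing $f$ as a parameter keeps the averaging step independent of which pixel-wise similarity is chosen.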

We propose two versions of $f$, and we will discuss later some situations
when one is more applicable than the other. We define the first version by
looking at the absolute value of the difference between regional
probabilities in the two segmentations:

$$f_1\big(p^A(x), p^{GT}(x)\big) = 1 - \frac{1}{2} \sum_{i=1}^{L}
\left| p_i^A(x) - p_i^{GT}(x) \right|, \tag{4}$$

where $x \in \Omega$. $f_1 = 1$ if and only if $p^A(x) = p^{GT}(x)$.
$f_1 = 0$ if and only if every region is assigned probability 0 by at least
one of the segmentations.

Figure 2: A comparison of $f_1$ and $f_2$ for varying regional probabilities
at a given pixel and $L = 2$ regions. The x and y axes represent the
probability of the pixel belonging to the first region in the automated and
ground truth segmentations, respectively. Note how $f_2$ drops off more
quickly near the diagonal and drops to 0 around the boundaries, when one of
the segmentations is almost certain (see discussion in the text). White
corresponds to 1 and black to 0.


The case of $f_1 = 0$ can be seen by first noting that
$|p_i^A(x) - p_i^{GT}(x)| \le \max(p_i^A(x), p_i^{GT}(x))$, with equality
holding only when one of the probabilities is 0. Thus, if neither of the
probabilities are 0 for some region,

$$\sum_{i=1}^{L} p_i^A(x) + \sum_{i=1}^{L} p_i^{GT}(x) = 2
= \sum_{i=1}^{L} \big( p_i^A(x) + p_i^{GT}(x) \big) \tag{5}$$
$$\ge \sum_{i=1}^{L} \max\big(p_i^A(x), p_i^{GT}(x)\big) \tag{6}$$
$$> \sum_{i=1}^{L} \left| p_i^A(x) - p_i^{GT}(x) \right| \tag{7}$$
$$= -2\,\big(f_1(x) - 1\big), \tag{8}$$

i.e. $f_1 > -\frac{2}{2} + 1$, which implies $f_1 > 0$. When all regions are
assigned probability 0 by at least one of the segmentations,
$p_i^A(x) + p_i^{GT}(x) = |p_i^A(x) - p_i^{GT}(x)|$ and thus $f_1 = 0$.

Substituting $f_1$ for $f$ in (3) gives a function $D_1^{cts}$ that extends
the DSC to multi-region probabilistic segmentations and reduces to the
discrete DSC from Section 2.2 when all probabilities are either 0 or 1.
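The behaviour of $f_1$ from (4) can be illustrated with a small Python sketch. The function name `f1` is an assumption made here. The probabilities reuse the earlier example of a ground truth probability of 0.9 compared with 0.6, 0.7 and 0.8, with $L = 2$ assumed for illustration.

```python
def f1(p_a, p_gt) -> float:
    """Pixel-wise similarity (4): one minus half the L1 distance."""
    return 1.0 - 0.5 * sum(abs(a - g) for a, g in zip(p_a, p_gt))


ground_truth = (0.9, 0.1)
for prob in (0.6, 0.7, 0.8):
    automated = (prob, 1.0 - prob)
    # Values are approximately 0.7, 0.8 and 0.9 (up to floating-point rounding).
    print(prob, round(f1(automated, ground_truth), 3))

# Boundary cases: identical vectors give 1, crisp disagreement gives 0.
print(f1((1.0, 0.0), (1.0, 0.0)), f1((1.0, 0.0), (0.0, 1.0)))
```

The similarity grows as the automated probability approaches 0.9, which is the ordering that the text asks of a continuous extension.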

Although $f_1$ achieves our goal, the space $\mathbb{S}^L$ is a Hilbert
space and as such has an inner product defined on it. This inner product
induces a distance function $d_a : (\mathbb{S}^L)^2 \to \mathbb{R}^+$, known
as the Aitchison distance [34]:

$$d_a(p, q) = \sqrt{ \sum_{i=1}^{L} \left( \ln\frac{p_i}{p_g}
- \ln\frac{q_i}{q_g} \right)^2 }, \tag{9}$$

where $p, q \in \mathbb{S}^L$ with components $p_i$ and $q_i$, and $p_g$ is
the geometric mean of the components of $p$ (likewise $q_g$ for $q$). We can
make use of the Aitchison distance to create another version of $f$, denoted
$f_2$, that utilizes this more natural way to compare probability vectors.
Now, since $d_a$ is a distance function, it is 0 when the probability
vectors being compared are identical, and approaches $\infty$ as the
probability vectors become maximally different. Thus, we define


$$f_2\big(p^A(x), p^{GT}(x)\big) =
\frac{1}{1 + d_a\big(p^A(x), p^{GT}(x)\big)}, \tag{10}$$

where $x \in \Omega$. Note that $f_2 = 1$ when $d_a = 0$ and $f_2 \to 0$ as
$d_a \to \infty$, and also that

$$\lim_{a \to 1} d_a([a, 1 - a, 0, \dots], [1 - a, a, 0, \dots]) = \infty
\tag{11}$$
$$\Rightarrow f_2([1, 0, 0, \dots], [0, 1, 0, \dots]) = 0, \tag{12}$$

for $[a, 1 - a, 0, \dots] \in \mathbb{S}^L$. Substituting $f_2$ for $f$ in
(3) gives a function $D_2^{cts}$ that, as $D_1^{cts}$, extends the DSC from
Section 2.2 and reduces to it for discrete segmentations.
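A hedged Python sketch of $f_2$ follows, using the standard form of the Aitchison distance (differences of log-ratios to the geometric mean). The function names are assumptions made here. The handling of zero components is also an assumption: identical vectors are given similarity 1 so that crisp agreement still counts as a match, and any other pair containing a 0 is given the limiting value 0.

```python
import math


def aitchison_distance(p, q) -> float:
    """Aitchison distance between two strictly positive probability vectors."""
    log_p = [math.log(v) for v in p]
    log_q = [math.log(v) for v in q]
    # ln(p_i / g(p)) = ln(p_i) - mean(ln p), with g the geometric mean.
    clr_p = [v - sum(log_p) / len(log_p) for v in log_p]
    clr_q = [v - sum(log_q) / len(log_q) for v in log_q]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr_p, clr_q)))


def f2(p_a, p_gt) -> float:
    """Pixel-wise similarity 1 / (1 + d_a), with assumed handling of zeros."""
    if tuple(p_a) == tuple(p_gt):
        return 1.0  # d_a = 0 for identical vectors (assumed to include crisp ones)
    if min(p_a) <= 0.0 or min(p_gt) <= 0.0:
        return 0.0  # assumed limiting value: d_a grows without bound
    return 1.0 / (1.0 + aitchison_distance(p_a, p_gt))


ground_truth = (0.9, 0.1)
for prob in (0.6, 0.7, 0.8):
    print(prob, f2((prob, 1.0 - prob), ground_truth))
```

Working with logarithms avoids forming the geometric mean explicitly, which is numerically safer when some probabilities are very small.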

Thus we have two functions, $D_1^{cts}$ and $D_2^{cts}$, that extend the DSC
to multi-region probabilistic segmentations. To compare $D_1^{cts}$ and
$D_2^{cts}$, we compare $f_1$ and $f_2$ at a pixel $x$. We will hold the
regional probabilities of the ground truth, $p^{GT}(x)$, fixed and consider
changing the regional probabilities of the automated segmentation, $p^A(x)$.
The values of $f_1$ and $f_2$ at a single pixel with varying probabilities
are seen in Figure 2. As $p^A(x)$ changes, $f_1(x)$ varies linearly with the
individual probabilities, while $f_2(x)$ varies more rapidly when $p^A(x)$
is close to $p^{GT}(x)$. However, due to the nature of the Aitchison
distance, $f_2(p^A(x), p^{GT}(x)) = 0$ if either $p^A(x)$ or $p^{GT}(x)$ is
certain (i.e. contains a 0 or a 1). This behaviour is reasonable when we
consider that having absolute certainty in a pixel’s label (even from manual
segmentation) is arguably unachievable in reality (see Cromwell’s rule
[35]).

Figure 3: (Colour Figure) An example showing $f_1$ and $f_2$ at each pixel
when comparing the top left segmentation (Ground Truth) to each of the other
segmentations (1 and 2). In the segmentations, the RGB values correspond to
the probabilities of the three regions. In the figures for $f_1$ and $f_2$,
white corresponds to 1 and black to 0. Panel labels: Segmentation 1, $f_1$:
DSC = 0.5807, $f_2$: DSC = 0.5707; Segmentation 2, $f_1$: DSC = 0.8339,
$f_2$: DSC = 0.8429.

Thus in applications where the automated segmentations are likely to be
close to the ground truth, $D_2^{cts}$ will be more sensitive to small
differences, but in applications that are likely to have many completely
certain pixels, $D_1^{cts}$ will better capture segmentation differences. We
see a comparison of $f_1$ and $f_2$ applied to two probabilistic
segmentations in Figure 3.

Figure 4: (Colour Figure) An example showing how the bipartite graph is
constructed and edge weights calculated. The edge between two nodes is
highlighted in red and the auxiliary binary segmentations for those two
nodes are shown. These auxiliary segmentations are compared to find a weight
for the edge.

Figure 5: All 7 images deformed and used in the user study summarized in
Figure 7.

To analyze the performance of our method, we carried out the following user
study: twenty-five people (unaware of the purpose of the study) were given a
random ordering of 7 incorrect segmentations of the same image and the
corresponding GT segmentation, and asked to rank the 7 incorrect
segmentations from most to least similar to the GT, where 1 indicates most
similar and 7 least similar. The incorrect segmentations were generated
using blurring, spatial deformations, and noise on the images in Figure 5.
Five of the sets of segmentations and their values of $D_1^{cts}$ and
$D_2^{cts}$ with respect to the GT are seen in Figure 6. In Figure 7 we see
the results of the survey, with each point corresponding to an incorrect
segmentation, the x-axis corresponding to the average human ranking, and the
y-axis corresponding to the ranking given by the proposed DSC. The line of
best fit through the data has a slope of 0.82, and Pearson’s correlation
coefficient between the DSC rankings and the average human rankings was
0.72, indicating a strong correlation.
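The comparison of rankings described above can be sketched as follows. The scores and average human ranks are made-up numbers, not the study data, and the helper names `ranks` and `pearson` are assumptions. Rank 1 is given to the highest DSC, mirroring "1 indicates most similar".

```python
import math


def ranks(scores):
    """Rank scores so that the highest score receives rank 1."""
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    result = [0] * len(scores)
    for rank, k in enumerate(order, start=1):
        result[k] = rank
    return result


def pearson(xs, ys) -> float:
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical DSC values for 7 segmentations and hypothetical average human ranks.
dsc_scores = [0.91, 0.55, 0.78, 0.62, 0.85, 0.70, 0.40]
human_ranks = [1.4, 5.8, 3.1, 5.2, 2.0, 3.9, 6.6]

print(ranks(dsc_scores))
print(pearson(human_ranks, ranks(dsc_scores)))
```

A coefficient near 1 would indicate that the metric orders the segmentations much as the participants did.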


2.4. Finding label correspondences via graph matching

There is no guarantee that an automated multi-region segmentation method
will label its regions in a way that corresponds to the labels of the ground
truth’s regions. This correspondence, however, is required to compare the
segmentations using the DSC and extensions described in Sections 2.2 and
2.3. In fact, if $\mathcal{L}_1$ is the set of region labels from the
automated segmentation and $\mathcal{L}_2$ is the set of region labels from
the ground truth, there is no guarantee that even
$|\mathcal{L}_1| = |\mathcal{L}_2|$. We require a way of establishing a
correspondence between the regions in the two segmentations.

In this section we assume the segmentations are probabilistic, as these are
a superset of discrete segmentations and Section 2.3 extends the discrete
methods of Section 2.2 to continuous methods.

Figure 6: A comparison of various ground truth segmentations (left) and
example segmentations. The multi-region probabilistic ground truth
segmentations for five images, both real and synthetic, are seen on the
left. These are compared to segmentations generated from their ground truths
by blurring the probabilities, spatial deformations, and the addition of
noise. The corresponding DSCs, both $D_1^{cts}$ and $D_2^{cts}$, are
reported for each segmentation. As expected, both DSCs drop as the
segmentation deviates further from the ground truth. Panel labels for one
set, ground truth first: first row 1, 0.7993, 0.6307, 0.8095, 0.6748,
0.9022, 0.7584, 0.4580; second row 1, 0.5160, 0.3789, 0.6258, 0.5099,
0.8699, 0.6822, 0.2852.

Figure 7: A comparison of average human rankings of the 49 incorrect
segmentations (see Figure 6) and the DSC ranks. We see a strong correlation
between the human rankings and the DSC rankings, with Pearson’s correlation
coefficient of 0.72.

We establish label correspondences by constructing a weighted complete
bipartite graph, with $L_1 = |\mathcal{L}_1|$ vertices on the left
representing the $L_1$ regions in the automated segmentation and
$L_2 = |\mathcal{L}_2|$ vertices on the right representing the $L_2$ regions
in the ground truth. For each pair of regions
$(i, j) \in (\mathcal{L}_1 \times \mathcal{L}_2)$ we will assign a weight
$w_{ij}$ to the edge between their corresponding vertices. We want $w_{ij}$
to be smaller when regions $i$ and $j$ match well, and larger when they
match poorly. Thus, to calculate $w_{ij}$, we create an auxiliary binary
segmentation, $p^{A\{i\}}(x)$, from the automated segmentation $p^A(x)$, by
treating region $i$ as the foreground and all other regions in the automated
segmentation as the background. Specifically, the probabilities of
$p^{A\{i\}}$ at a pixel $x$ are given by

$$p_1^{A\{i\}}(x) = p_i^A(x) \tag{13}$$
$$p_2^{A\{i\}}(x) = \sum_{\substack{\ell \in \mathcal{L}_1 \\ \ell \ne i}}
p_\ell^A(x). \tag{14}$$

We create a similar auxiliary segmentation $p^{GT\{j\}}$ using region $j$ in
the ground truth segmentation.

We then calculate the DSC between the auxiliary segmentations using$^1$
$D^{cts}$ from (3). Doing so will give a similarity coefficient between
regions $i$ and $j$ that is independent of the other regions. The edge in
the bipartite graph between the vertices corresponding to regions $i$ and
$j$ will then be assigned the weight $w_{ij} = 1 - D^{cts}$. Figure 4
illustrates an example of how an edge weight is calculated.
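The edge-weight construction can be sketched in Python as below. This is an illustrative sketch under stated assumptions: regions are indexed from 0, each segmentation is a list of per-pixel probability vectors, $f_1$ is used as the pixel-wise similarity, and the names `auxiliary` and `edge_weights` are hypothetical.

```python
def f1(a, g) -> float:
    """Pixel-wise similarity based on absolute probability differences."""
    return 1.0 - 0.5 * sum(abs(u - v) for u, v in zip(a, g))


def d_cts(p_a, p_gt) -> float:
    """Continuous DSC: mean pixel-wise similarity (here with f1)."""
    return sum(f1(a, g) for a, g in zip(p_a, p_gt)) / len(p_a)


def auxiliary(seg, region):
    """Binary segmentation with `region` as foreground, the rest as background."""
    return [(p[region], sum(v for k, v in enumerate(p) if k != region))
            for p in seg]


def edge_weights(seg_a, seg_gt):
    """w[i][j] = 1 - D_cts between region i (automated) and region j (GT)."""
    n_a, n_gt = len(seg_a[0]), len(seg_gt[0])
    return [[1.0 - d_cts(auxiliary(seg_a, i), auxiliary(seg_gt, j))
             for j in range(n_gt)]
            for i in range(n_a)]


# Hypothetical 4-pixel, 3-region ground truth; the automated segmentation is
# the same segmentation with its labels cyclically permuted.
ground_truth = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.0)]
automated = [(g[1], g[2], g[0]) for g in ground_truth]

weights = edge_weights(automated, ground_truth)
for row in weights:
    print(row)
```

In this example the zero-weight entries recover the permutation: automated regions 0, 1 and 2 correspond to ground truth regions 1, 2 and 0.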

Once the graph is constructed, we apply the Hungarian (Kuhn-Munkres)
bipartite graph matching algorithm to find a minimal matching [36]. We take
this matching as the correspondence between the regions of the two
segmentations. We use this correspondence to relabel the automated
segmentation.
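To make the matching step concrete without external dependencies, the sketch below finds a minimum-weight matching by exhaustive search over assignments. This is a stand-in for the Hungarian algorithm, not an implementation of it: it reaches the same minimal total weight but takes factorial time, so it is only suitable for a handful of regions. The function name and the weight matrices are hypothetical.

```python
from itertools import permutations


def min_cost_matching(w):
    """Return {row: column} of minimum total weight.

    Every vertex on the smaller side of the bipartite graph is matched;
    vertices left over on the larger side stay unmatched.
    """
    n_rows, n_cols = len(w), len(w[0])
    if n_rows <= n_cols:
        cols = min(permutations(range(n_cols), n_rows),
                   key=lambda c: sum(w[i][j] for i, j in enumerate(c)))
        return dict(enumerate(cols))
    rows = min(permutations(range(n_rows), n_cols),
               key=lambda r: sum(w[i][j] for j, i in enumerate(r)))
    return {i: j for j, i in enumerate(rows)}


# Hypothetical weights: rows are automated regions, columns ground truth regions.
square = [[0.5, 0.0, 0.9],
          [0.8, 0.7, 0.0],
          [0.0, 0.6, 0.4]]
print(min_cost_matching(square))  # {0: 1, 1: 2, 2: 0}

# More automated regions than ground truth regions: row 2 stays unmatched.
tall = [[0.1, 0.9],
        [0.8, 0.2],
        [0.5, 0.6]]
print(min_cost_matching(tall))  # {0: 0, 1: 1}
```

The returned dictionary can be used directly to relabel the automated segmentation, and any automated region missing from it is an unmatched region.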

When $|\mathcal{L}_1| > |\mathcal{L}_2|$, it may be the case that some
regions were oversegmented, and multiple regions from the automated
segmentation all correspond to the same region in the ground truth.
Therefore, we may wish to merge regions together to obtain a more optimistic
estimate of the DSC. If this is the case, after the initial matching let
$\mathcal{M}$ and $\mathcal{U}$ be the regions of the automated segmentation
that are matched or unmatched to regions in the ground truth, respectively.
We want to add each of the regions from $\mathcal{U}$ to one of the regions
from $\mathcal{M}$. Given an unmatched region from $\mathcal{U}$ (denoted
$i_u$), for each pair of matched regions from
$(\mathcal{M} \times \mathcal{L}_2)$ (denoted $i_m$ and $j_m$) we will
create a new auxiliary binary segmentation $p^{A\{i_u, i_m\}}$ by treating
the combination of regions $i_u$ and $i_m$ as the foreground and all other
regions in the automated segmentation as the background. Specifically, the
probabilities of $p^{A\{i_u, i_m\}}$ at a pixel $x$ are given by

$$p_1^{A\{i_u, i_m\}}(x) = p_{i_u}^A(x) + p_{i_m}^A(x) \tag{15}$$
$$p_2^{A\{i_u, i_m\}}(x) =
\sum_{\substack{\ell \in \mathcal{L}_1 \\ \ell \ne i_u, i_m}}
p_\ell^A(x). \tag{16}$$

We will then decide if adding $i_u$ to $i_m$ improves the matching with
$j_m$ by taking the difference

$$\delta_{i_u}(i_m) = D^{cts}\big(p^{A\{i_u, i_m\}}, p^{GT\{j_m\}}\big)
- D^{cts}\big(p^{A\{i_m\}}, p^{GT\{j_m\}}\big). \tag{17}$$

Once we have calculated $\delta_{i_u}(i_m)$ for each matched region we will
permanently add $i_u$ to the region given by

$$\operatorname*{argmax}_{i_m} \delta_{i_u}(i_m), \tag{18}$$

i.e. the region whose matching will be improved the most by the addition of
$i_u$. We then update $\mathcal{M}$ so that $i_m$ denotes the merged region,
remove $i_u$ from $\mathcal{U}$
($\mathcal{U} = \mathcal{U} \setminus \{i_u\}$), and repeat until
$\mathcal{U} = \emptyset$. A similar secondary matching phase may be used if
$|\mathcal{L}_1| < |\mathcal{L}_2|$ and the automated segmentation is
thought to have under-segmented some regions.

Figure 8: (Colour Figure) Two segmentations of the same image and the
correspondences established using our approach. Blue lines indicate regions
matched between the segmentations using the bipartite graph and green lines
indicate which matched region unmatched regions were added to.

$^1$ Using either $D_1^{cts}$ or $D_2^{cts}$ depending on the application.

Figure 8 shows a result of our region matching technique. All 6 regions from
the bottom segmentation are matched to the region from the top segmentation
with which they have the highest DSC. For example, the DSC between region 4
in the bottom segmentation and region 6 in the top segmentation is 0.9370,
whereas region 4 from the bottom has DSC less than 0.75 with all other
regions from the top segmentation. Regions 2 and 4 in the top segmentation
were not matched using the bipartite graph, and it was found that when they
were both added to region 1 in the top segmentation they improved the DSC
with region 1 in the bottom segmentation from 0.8757 to 0.9329.
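The secondary phase that merges unmatched regions can be sketched as below. It assumes 0-based region indices, per-pixel probability vectors, $f_1$ as the pixel-wise similarity, and an initial matching given as a dictionary from automated regions to ground truth regions. All names are hypothetical. Following the text, each unmatched region is always added to the matched region with the largest gain, even if that gain is negative.

```python
def f1(a, g) -> float:
    return 1.0 - 0.5 * sum(abs(u - v) for u, v in zip(a, g))


def d_cts(p_a, p_gt) -> float:
    return sum(f1(a, g) for a, g in zip(p_a, p_gt)) / len(p_a)


def auxiliary(seg, regions):
    """Binary segmentation with the union of `regions` as foreground."""
    return [(sum(p[r] for r in regions),
             sum(v for k, v in enumerate(p) if k not in regions))
            for p in seg]


def merge_unmatched(seg_a, seg_gt, matching):
    """Greedily add each unmatched automated region to a matched one.

    Returns {ground truth region: set of automated regions merged onto it}.
    """
    groups = {j: {i} for i, j in matching.items()}
    unmatched = [i for i in range(len(seg_a[0])) if i not in matching]
    for i_u in unmatched:
        def gain(j_m):
            target = auxiliary(seg_gt, {j_m})
            merged = d_cts(auxiliary(seg_a, groups[j_m] | {i_u}), target)
            current = d_cts(auxiliary(seg_a, groups[j_m]), target)
            return merged - current  # change in DSC from adding i_u

        best = max(groups, key=gain)  # largest improvement wins
        groups[best].add(i_u)
    return groups


# Hypothetical example: ground truth region 0 is split into automated
# regions 0 and 2; automated region 1 equals ground truth region 1.
ground_truth = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]
automated = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]

groups = merge_unmatched(automated, ground_truth, {0: 0, 1: 1})
print(groups)  # {0: {0, 2}, 1: {1}}
```

Because merged groups are kept in `groups`, later unmatched regions are compared against the already merged regions, which corresponds to adding each region permanently before moving on.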

Combining the techniques introduced in this section with the extended DSC
function (3) enables the comparison of any two multi-region probabilistic
segmentations, even if they have different numbers of regions or if one is
crisp and the other probabilistic.

3. Conclusions 

Validation is crucial for automated and semi-automated segmentation methods,
and this can be achieved by comparing the resulting segmentations with a
ground truth segmentation. Such comparisons are often done using the DSC,
but this method only applies to crisp binary segmentations. We have
motivated the importance of multi-region probabilistic segmentations, and
thus the importance of extending the DSC to compare such segmentations. We
have achieved this goal by proposing two different extensions with different
qualities that allow the comparison of probabilistic or crisp segmentations
with any number of regions in one framework. We have shown how to establish
label correspondences between segmentations, even when they have different
numbers of regions, so that the DSC can be applied. This work greatly
extends the usability of the DSC and provides a seamless comparison metric
across a wide variety of segmentations (e.g. crisp and probabilistic; binary
and multi-region; and differing number of regions, with and without ordered
labels).